134 research outputs found

    The effectiveness of loop unrolling for modulo scheduling in clustered VLIW architectures

    Clustered organizations are becoming a common trend in the design of VLIW architectures. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instruction scheduling in a single pass, which is shown to be more effective than performing the assignment first and the scheduling afterwards. We also show that loop unrolling significantly enhances the performance of the proposed scheduler, especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we can obtain the best performance with the minimum increase in code size. Performance evaluation for the SPECfp95 shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover, when the cycle time is taken into account, a 4-cluster configuration is 3.6 times faster than the unified architecture. Peer Reviewed. Postprint (published version).
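
One general reason unrolling can help when buses are the bottleneck is that the initiation interval (II) must be an integer, so a fractional bus-imposed bound is rounded up for every iteration. The sketch below illustrates only that rounding effect; it is not the scheduler from the paper. The function names, the one-value-per-bus-per-cycle model, and the assumption that transfers grow linearly with the unroll factor are all hypothetical.

```python
from math import ceil


def bus_bound_ii(comms_per_iteration: int, buses: int, unroll: int = 1) -> int:
    """Smallest II (in cycles) that the buses allow for the unrolled loop body.

    Assumption: each bus carries one value per cycle, and unrolling multiplies
    the number of inter-cluster transfers by the unroll factor.
    """
    return ceil(comms_per_iteration * unroll / buses)


def cycles_per_original_iteration(comms_per_iteration: int, buses: int, unroll: int = 1) -> float:
    """Bus-imposed II of the unrolled body divided by the unroll factor."""
    return bus_bound_ii(comms_per_iteration, buses, unroll) / unroll


if __name__ == "__main__":
    # Hypothetical loop: 3 inter-cluster transfers per iteration, 2 buses.
    for factor in (1, 2, 4):
        print(factor, cycles_per_original_iteration(3, 2, factor))
```

With 3 transfers and 2 buses, the bound drops from 2 cycles to 1.5 cycles per original iteration once the loop is unrolled by 2; unrolling further gives no extra gain under this model, which is consistent with unrolling only selectively.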

    Modulo scheduling for a fully-distributed clustered VLIW architecture

    Clustering is an approach that many microprocessors are adopting in recent times in order to mitigate the increasing penalties of wire delays. We propose a novel clustered VLIW architecture which has all its resources partitioned among clusters, including the cache memory. A modulo scheduling scheme for this architecture is also proposed. This algorithm takes into account both register and memory inter-cluster communications so that the final schedule results in a cluster assignment that favors cluster locality in cache references and register accesses. It has been evaluated for both 2- and 4-cluster configurations and for differing numbers and latencies of inter-cluster buses. The proposed algorithm produces schedules with very low communication requirements and outperforms previous cluster-oriented schedulers. Peer Reviewed. Postprint (published version).

    Fast, accurate and flexible data locality analysis

    This paper presents a tool based on a new approach for analyzing the locality exhibited by data memory references. The tool is very fast because it is based on a static locality analysis enhanced with very simple profiling information, which results in a negligible slowdown. This feature allows the tool to be used for highly time-consuming applications and to be included as a step in a typical iterative analysis-optimization process. The tool can provide a detailed evaluation of the reuse exhibited by a program, quantifying and qualifying the different types of misses either globally or broken down by program sections, data structures, memory instructions, etc. The accuracy of the tool is validated by comparing its results with those provided by a simulator. Peer Reviewed. Postprint (published version).

    Flexible compiler-managed L0 buffers for clustered VLIW processors

    Wire delays are a major concern for current and forthcoming processors. One approach to attack this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the data cache remains centralized. However, as technology evolves, the latency of such a centralized cache increases, leading to a significant performance impact. In this paper, we propose to include flexible low-latency buffers in each cluster in order to reduce the performance impact of higher cache latencies. The reduced number of entries in each buffer permits the design of flexible ways to map data from L1 to these buffers. The proposed L0 buffers are managed by the compiler, which is responsible for deciding which memory instructions make use of them. Effective instruction scheduling techniques are proposed to generate code that exploits these buffers. Results for the Mediabench benchmark suite show that the performance of a clustered VLIW processor with a unified L1 data cache is improved by 16% when such buffers are used. In addition, the proposed architecture also shows significant advantages over both MultiVLIW processors and clustered processors with a word-interleaved cache, two state-of-the-art designs with a distributed L1 data cache. Peer Reviewed. Postprint (published version).

    A unified modulo scheduling and register allocation technique for clustered processors

    This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more effective than traditional approaches based on sequentially performing some (or all) of the three steps, since it allows optimizing the global code generation problem instead of searching for optimal solutions to each individual step. Besides, it avoids the iterative nature of traditional approaches, which require repeated applications of the three steps until a valid solution is found. The proposed framework includes a mechanism to insert spill code on-the-fly and heuristics to evaluate the quality of partial schedules considering simultaneously inter-cluster communications, memory pressure and register pressure. Transformations that allow trading pressure on a type of resource for another resource are also included. We show that the proposed technique outperforms previously proposed techniques. For instance, the average speed-up for the SPECfp95 is 36% for a 4-cluster configuration. Peer Reviewed. Postprint (published version).

    Dummy regression analysis for modelling the nutritionally tailored fillet fatty acid composition of turbot and sole using gilthead sea bream as a reference subgroup category

    Farmed turbot and sole were sampled at different stages of the production cycle for analysis of fillet lipid content and fatty acid (FA) composition. The entire data set along with our own published data on gilthead sea bream were fitted to dummy regression equations with turbot and sole as dummy variables, gilthead sea bream as a reference subgroup category, and diet FA composition and fillet lipid content as independent variables. The relative contribution of each independent variable to the total variance was found to vary within and among FAs and fish species, but strong correlation coefficients (0.76–0.99) were found for almost all of the FA equations, including saturated FAs, monoenes and long-chain polyunsaturated fatty acids (PUFA) of n-3 and n-6 series. Given the differences in lipogenic activities of the fish species, major interaction effects between fillet lipid content and dummy variables were found for monoenes and saturated FAs. The proposed equations (hosted at www.nutrigroup-iats.org/aquafat) were able to fit different proportions of EPA, DPA and DHA underlying the fish species differences in FA desaturation/elongation pathways. The robustness of the model was proven with extra data from the three fish species, allowing a close linear association near to equality for the scatter plot of observed and predicted values. © 2014 John Wiley & Sons Ltd. This study was funded by Spanish (AQUAFAT, AGL2009-07797, Predictive modelling of flesh fatty acid composition in cultured fish species with different muscle lipid content; AQUAGENOMICS, CSD2007-00002, Improvement of aquaculture production by the use of biotechnological tools) and EU (ARRAINA, KBBE-2011-5-288925, Advanced research initiatives for nutrition and aquaculture) projects. Additional funding was obtained from the ‘Generalitat Valenciana’ (research grant PROMETEO 2010/006). GFB-L was recipient of a Spanish PhD fellowship from the Diputación Provincial de Castellón. Peer Reviewed.
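
The dummy coding described in this abstract can be sketched as follows: the reference species gets zeros in both dummy columns, and each dummy is also multiplied by fillet lipid content to form an interaction term. This is a minimal illustration, not the published equations; the column order, the `design_row` and `predict` helpers, and the species labels are hypothetical, and no fitted coefficients from the study are reproduced.

```python
# Species encoded with dummy variables; gilthead sea bream is the reference
# category and is therefore represented by zeros in both dummy columns.
DUMMY_SPECIES = ("turbot", "sole")
REFERENCE_SPECIES = "sea_bream"  # hypothetical label for the reference group


def design_row(species: str, diet_fa: float, fillet_lipid: float) -> list[float]:
    """Build one row of the regression design matrix.

    Columns: intercept, diet FA, fillet lipid, D_turbot, D_sole,
    fillet lipid x D_turbot, fillet lipid x D_sole.
    """
    if species != REFERENCE_SPECIES and species not in DUMMY_SPECIES:
        raise ValueError(f"unknown species: {species}")
    dummies = [1.0 if species == name else 0.0 for name in DUMMY_SPECIES]
    interactions = [d * fillet_lipid for d in dummies]
    return [1.0, diet_fa, fillet_lipid, *dummies, *interactions]


def predict(coefficients: list[float], row: list[float]) -> float:
    """Evaluate the linear model for one design row."""
    if len(coefficients) != len(row):
        raise ValueError("coefficient and row lengths differ")
    return sum(c * x for c, x in zip(coefficients, row))


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study.
    print(design_row("sea_bream", 10.0, 5.0))
    print(design_row("turbot", 10.0, 5.0))
```

For the reference species the dummy and interaction columns vanish, so its prediction depends only on the intercept, diet FA and fillet lipid terms; the other species shift both the intercept and the lipid slope.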

    AGAMOS: A graph-based approach to modulo scheduling for clustered microarchitectures

    This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy to distribute the workload among clusters and reduces the number of intercluster communications at the same time. Partitioning is guided by approximate schedules (i.e., pseudoschedules), which take into account all of the constraints that influence the final schedule. To further reduce the number of intercluster communications, heuristics for instruction replication are included. The proposed scheme is evaluated using the SPECfp95 programs. The described scheme outperforms a state-of-the-art scheduler for all programs and different cluster configurations. For some configurations, the speedup obtained when using this new scheme is greater than 40 percent, and for selected programs, performance can be more than doubled. Peer Reviewed. Postprint (published version).
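
The two quantities a partitioner of this kind trades off, workload per cluster and intercluster communications, can be measured on a small data dependence graph as sketched below. This is not the AGAMOS algorithm: it only evaluates a given assignment. The graph, the assignments, and the assumption that a value needs one transfer per destination cluster (however many consumers it has there) are hypothetical.

```python
from collections import Counter

# Hypothetical data dependence graph: (producer, consumer) pairs.
EDGES = [("a", "c"), ("a", "d"), ("b", "d"), ("c", "e"), ("d", "e")]

# Three hypothetical assignments of instructions to clusters 0 and 1.
SPLIT_A = {"a": 0, "c": 0, "b": 1, "d": 1, "e": 1}
SPLIT_B = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
SPLIT_C = {"a": 0, "b": 1, "c": 1, "d": 0, "e": 1}


def intercluster_communications(edges, cluster_of) -> int:
    """Count value transfers between clusters.

    Assumption: a produced value is sent at most once to each remote
    cluster, so several consumers in the same cluster share one transfer.
    """
    transfers = {
        (src, cluster_of[dst])
        for src, dst in edges
        if cluster_of[src] != cluster_of[dst]
    }
    return len(transfers)


def cluster_load(cluster_of) -> Counter:
    """Number of instructions assigned to each cluster."""
    return Counter(cluster_of.values())


if __name__ == "__main__":
    for name, split in (("A", SPLIT_A), ("B", SPLIT_B), ("C", SPLIT_C)):
        print(name, intercluster_communications(EDGES, split), dict(cluster_load(split)))
```

In `SPLIT_B` three edges cross clusters but only two transfers are counted, because both consumers of `a` sit in the same remote cluster.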

    Exploiting pseudo-schedules to guide data dependence graph partitioning

    This paper presents a new modulo scheduling algorithm for clustered microarchitectures. The main feature of the proposed scheme is that the assignment of instructions to clusters is done by means of graph partitioning algorithms that are guided by a pseudo-scheduler. This pseudo-scheduler is a simplified version of the full instruction scheduler and estimates key constraints that would be encountered in the final schedule. The final scheduling process is bi-directional and includes on-the-fly spill code generation. The proposed scheme is evaluated against previous scheduling approaches using the SPECfp95 benchmark suite. Our modeling results show that better schedules are obtained for most programs across a range of different architectures. For a 4-cluster VLIW architecture with 32 registers and a 2-cycle inter-cluster communication delay we obtain an average speedup of 38.5%. Peer Reviewed. Postprint (published version).

    Furfural, 5-HMF, acid-soluble lignin and sugar contents in C. ladanifer and E. arborea lignocellulosic biomass hydrolysates obtained from microwave-assisted treatments in different solvents

    Cistus ladanifer L. and Erica arborea L. are the two most representative shrub species from the Iberian Peninsula. With a view to their valorization, their biomass hydrolysate components, obtained from microwave-assisted treatments with choline chloride/urea - HNO3 10%, N,N-dimethylacetamide/NaHCO3 and N,N-dimethylacetamide/CH3OK as solvents, have been measured using a spectrophotometric method. Concentrations of furfural and 5-(hydroxymethyl)furfural (5-HMF) in the filtrate have been determined after reduction with NaBH4. The production of total sugars, reducing sugars and non-reducing sugars has also been assessed. The obtained results support the choice of microwave-assisted choline chloride/urea deep eutectic solvent in acid media as the preferred method (over the polar aprotic solvent-based alternatives) for the extraction of lignin, furfural, 5-HMF and sugars from C. ladanifer and E. arborea biomass, attaining the best production yields for 60 min exposure times. The situation is different if the aim of the treatments is to recover sugars from both shrubs for subsequent enzymatic saccharification: the very low 5-HMF contents resulting from the dimethylacetamide systems (especially in association with CH3OK) make them highly advantageous as compared to the traditional method using NaOH.

    Crystallinity of cellulose microfibers derived from Cistus ladanifer and Erica arborea shrubs

    The effectiveness of the use of cellulose fibers as particulates/composite reinforcers involves the assessment of the crystallinity of such fibers. The aim of the present work is to provide information on the degree of crystallinity of the cellulose microfibers obtained from the stems of Cistus ladanifer and Erica arborea shrubs through two different methods, namely an alkaline treatment and a microwave-assisted deep eutectic solvent (DES) method. The crystallinity indexes (CrI) obtained from X-ray powder diffraction patterns indicated that higher CrI were attained for cellulose obtained from the DES treatment. Complementary information on the degree of crystallinity was also retrieved from attenuated total reflection-Fourier transform infrared spectroscopy (ATR-FTIR) vibrational spectra, scanning electron microscopy (SEM) micrographs, and accessibility data for the DES-treated celluloses from the two species. The crystallinity results for the fibers derived from these two Mediterranean shrubs were within the range of the results for those derived from wood pulp, opening the door to their valorization for cellulose-derived packing applications or for their use as reinforcers in composite materials in combination with other biopolymers.
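
The abstract does not say how CrI was computed from the diffraction patterns. A common choice for cellulose is the Segal peak-height method, which compares the intensity of the main crystalline reflection with the intensity of the amorphous minimum; the sketch below assumes that method purely for illustration. The function name and the input intensities are hypothetical.

```python
def segal_crystallinity_index(i_crystalline: float, i_amorphous: float) -> float:
    """Crystallinity index in percent by the Segal peak-height method.

    i_crystalline: intensity of the main crystalline peak (the (200)
                   reflection for cellulose I).
    i_amorphous:   intensity at the minimum between the main peaks,
                   attributed to amorphous material.
    Assumption: the study may have used a different method.
    """
    if i_crystalline <= 0:
        raise ValueError("crystalline peak intensity must be positive")
    if i_amorphous < 0:
        raise ValueError("amorphous intensity must be non-negative")
    return 100.0 * (i_crystalline - i_amorphous) / i_crystalline


if __name__ == "__main__":
    # Illustrative intensities in arbitrary units, not data from the study.
    print(segal_crystallinity_index(1000.0, 300.0))
```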